Development of SRI's translation systems for broadcast news and broadcast conversations

Authors

  • Jing Zheng
  • Wen Wang
  • Necip Fazil Ayan
Abstract

We present our recent work on developing large-vocabulary Arabic-to-English and Chinese-to-English speech-to-text translation systems for the January 2008 Global Autonomous Language Exploitation (GALE) retest evaluation. Two audio genres were involved in the evaluation: broadcast news and broadcast conversation. Our system, following the hierarchical phrase-based translation approach, has a two-pass decoding strategy: the first-pass integrated search generates n-best lists of 3000 unique hypotheses, which are then reranked by several different language models in the second pass. We emphasize our work on adapting the system, which was mostly trained on text data, to the speech genres, including number tokenization, punctuation compensation, and various optimization techniques. We present our results on several different tuning and testing data sets used for system development.
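
The two-pass strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hypothesis` class, the `unique_nbest` and `rerank` functions, the feature name `lm5`, and the simple weighted-sum scoring are all assumptions chosen for illustration. The abstract only states that a unique n-best list from the first pass is reranked by several language models.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hypothesis:
    """One translation candidate (hypothetical structure, for illustration)."""
    text: str
    first_pass_score: float  # log-domain score assigned by the first-pass decoder
    lm_scores: Dict[str, float] = field(default_factory=dict)  # second-pass LM log scores


def unique_nbest(hyps: List[Hypothesis], limit: int) -> List[Hypothesis]:
    """Keep the best-scoring copy of each distinct string, at most `limit` entries."""
    best: Dict[str, Hypothesis] = {}
    for h in hyps:
        kept = best.get(h.text)
        if kept is None or h.first_pass_score > kept.first_pass_score:
            best[h.text] = h
    ranked = sorted(best.values(), key=lambda h: h.first_pass_score, reverse=True)
    return ranked[:limit]


def rerank(hyps: List[Hypothesis], weights: Dict[str, float],
           first_pass_weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by a weighted sum of first-pass and language model scores.

    A language model score missing from a hypothesis counts as 0.0 here,
    which is an assumption of this sketch.
    """
    def total(h: Hypothesis) -> float:
        lm_part = sum(w * h.lm_scores.get(name, 0.0) for name, w in weights.items())
        return first_pass_weight * h.first_pass_score + lm_part

    return sorted(hyps, key=total, reverse=True)


candidates = [
    Hypothesis("a b", -2.0, {"lm5": -1.0}),
    Hypothesis("a b", -3.0, {"lm5": -1.0}),  # duplicate string with a worse score
    Hypothesis("a c", -2.5, {"lm5": -0.1}),
]
nbest = unique_nbest(candidates, limit=3000)
reranked = rerank(nbest, weights={"lm5": 1.0})
```

In this toy example the first pass prefers "a b", while the added language model score moves "a c" to the top after reranking. In a real system the weights would be tuned on held-out data rather than set by hand.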

Similar resources

Using syntax in large-scale audio document translation

Recently, the use of syntax has very effectively improved machine translation (MT) quality in many text translation tasks. However, using syntax in speech translation poses additional challenges because of disfluencies and other spoken language phenomena, and because of errors introduced by automatic speech recognition (ASR). In this paper, we investigate the effect of using syntax in a large-scale aud...

The Effect of Broadcast Digitalization on Agricultural Information Dissemination in Nigeria.

Broadcast digitalization, with its enormous benefits to the broadcasting industry, will improve the quality of the program content delivered by television stations. Africa has a switchover date of June 2017. For Nigerians to have access to television broadcasts once the switchover is completed, they must purchase high-definition television sets or a set-top box. The awareness among urban dwelle...

An Investigation of LTE Broadcast

Broadcast and broadband communications have undoubtedly become a part of today’s social life. Making content of interest accessible to the audience at any place and at any time, regardless of the type of consumer device, can contribute effectively to the audience's desire to use broadcast content. The HD and Ultra HD qualities, the desire for demand-driven application...

An Analysis of Sentence Segmentation Features for Broadcast News, Broadcast Conversations, and Meetings

Information retrieval techniques for speech are based on those developed for text, and thus expect structured data as input. An essential task is to add sentence boundary information to the otherwise unannotated stream of words output by automatic speech recognition systems. We analyze sentence segmentation performance as a function of feature types and transcription (manual versus automatic) f...

Journal title:

Volume   Issue 

Pages  -

Publication date: 2008